ABSTRACT

The  effect  of  a  two-stage  sampling  design  on  statistical  inference  is 
discussed.  A  definition  of  a  design  effect  is  given.  The  structure  of  design 
effects  for  a  class  of  statistics  is  investigated.  Results  have  both  a 
design-based  and  a  model-based  interpretation.  The  relation  between  design 
effects  for  multivariate  statistics  and  design  effects  for  univariate 
statistics  is  considered. 


AMS  (MOS)  Subject  Classifications:  62D05,  62H10 

Key Words: Design Effect; Model Misspecification; Two-stage Sampling; Finite
Population; Sample Survey.
SIGNIFICANCE AND EXPLANATION


In sample surveys, the design effect of a statistic is usually defined as
the ratio of its true variance under the given sample design to its variance
had the sample been obtained by simple random sampling.

Empirical work suggests certain patterns for design effects of different
types of statistics under different designs but theoretical work explaining
these patterns is limited. This paper obtains general theoretical results on
the structure of design effects for a broad class of statistics under a
two-stage sampling design. In particular, it discusses the relation between
design effects of multivariate and of univariate statistics.

This  relation  is  of  practical  interest  because  it  is  of  relevance  to  the 
imputation  of  standard  errors  for  multivariate  statistics  such  as  correlation 
coefficients  or  regression  coefficients  using  design  effects  of  univariate 
statistics. The latter quantities are often routinely derived on completion
of the survey. The former may be difficult to compute by standard procedures
because of software or degrees of freedom limitations.
DESIGN EFFECTS OF TWO-STAGE SAMPLING
C.  J.  Skinner 

1.  INTRODUCTION 

The application of statistical methods such as regression analysis and multivariate
analysis to sample survey data is now widespread. Such methods typically assume that the
rows of an $n \times q$ data matrix, $x_n$, are realizations of independent and identically
distributed (IID) random vectors. A general question may therefore be raised as to the
validity of inference procedures which make this assumption when the data are derived using
a complex sample design. In particular, this paper is concerned with the effect of
two-stage sampling on the estimation of functions of population moments, such as correlation
coefficients.

The term 'design effect' was originally introduced (Kish, 1965) as a measure of
efficiency for comparing sample designs. More recently (e.g. Rao and Scott, 1981) it has
also been used as a measure of the impact of a sample design on an inference procedure. We
shall be concerned only with this latter concept.

We presume a basic acquaintance with the distinction between the design-based and the
model-based approaches to survey-sampling inference (e.g. Sarndal, 1978). From the
design-based viewpoint the interpretation of 'the effect of two-stage sampling' is clear. The IID
assumption corresponds to the randomization distribution induced in $x_n$ by simple random
sampling with replacement from a finite population (or without replacement from an infinite
population). Two-stage sampling induces a different distribution in $x_n$ and consequently
perturbs the distribution of estimators from that predicted by IID theory.

From  the  model-based  viewpoint  the  effect  of  the  sampling  design  on  inference  is  much 
less  clear.  The  model-based  approach  begins  by  specifying  a  model  distribution  for  the 
matrix  of  values,  x,  of  the  population  units.  Inference  then  proceeds  in  one  of  two 
ways:


(A)  Inference  is  based  only  on  the  model  distribution  conditional  on  the  units 
actually  obtained  in  the  sample  and  irrespective  of  any  other  sample  that  might 
have  been  selected. 

(B) Inference incorporates both the model distribution and the randomization
distribution induced by the sample design.

The role of the sample design in model-based inference is by no means a subject of
universal agreement (see, for example, the discussion of Royall and Cumberland, 1981).
Sugden and Smith (1984) specify various conditions for choosing between (A) and (B). For
example, it may be inappropriate to ignore the sample design when sampling on the
dependent variable in regression analysis (cf. Nathan and Holt, 1980). In
such cases the design has a direct effect on inference.

In Section 2 we adopt procedure (A). In this case the effect of the sample design is
more indirect. For example, two-stage sampling presumes that the population is divided
into clusters; units within clusters usually tend to be more alike than units in different
clusters. This implies that the IID assumption for $x_n$ corresponds to an inappropriate
model assumption. The 'effect of the design' is therefore really the effect of
misspecifying the model (cf. Scott and Holt, 1982, p. 850). For if the true model for $x_n$
is in fact IID then two-stage sampling would have no effect under procedure (A).
Conversely, if the true model is not IID and we happen to choose the same sample of units
by (i) simple random sampling and (ii) two-stage sampling then the effect on inference is
identical for (i) and (ii).

Our approach will be to define a distribution for $x_n$ which has both a design-based
and model-based interpretation and then to obtain results which may be interpreted as
respectively design effects or misspecification effects. Because of the mathematical
isomorphism between the results under the two approaches it will be convenient to use the
single term design effect. We maintain, however, that this effect has distinct
interpretations under the two approaches.


There is a further problem from the model-based viewpoint with the effect of design on
statistical methods such as regression analysis. Suppose we take a two-stage sample and
decide that the appropriate model allows for different regression relationships in
different clusters. It may be argued (e.g. Pfefferman and Nathan, 1981) that the target
parameters of interest are then the individual cluster regression coefficients rather than
any overall population regression coefficient. We shall ignore this consideration here and
assume that the target is a well defined population parameter. We view the design as an
arbitrary selection process with no characteristic of substantive interest upon which we
wish to 'condition' (cf. Kish and Frankel, 1974).

We now introduce our basic definition of 'design effect'. We take $x_n$ as a member of
the infinite sequence $\{x_n;\ n = 1, 2, \ldots\}$. Let $\pi_0 = \pi_0(x_n)$ be the 'baseline'
distribution of $x_n$, under the IID assumption. Let $\pi_1 = \pi_1(x_n)$ be the true
distribution of $x_n$. From the design-based viewpoint $\pi_1$ is the randomisation
distribution induced by the complex sampling design. From the model-based viewpoint,
assuming procedure (A) above, $\pi_1$ is the true model distribution of $x$ 'marginalised'
to $x_n$.

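Definition 1.1: Suppose $t_n = t_n(x_n)$ is a scalar statistic obeying the following central
limit laws as $n \to \infty$:

$$n^{1/2}(t_n - \theta_0) \xrightarrow{d} N(0, \sigma_0^2) \quad \text{under } \pi_0,$$
$$n^{1/2}(t_n - \theta_1) \xrightarrow{d} N(0, \sigma_1^2) \quad \text{under } \pi_1.$$

Suppose also that $v_{0,n} = v_{0,n}(x_n)$ is consistent for $\sigma_0^2$ under $\pi_0$ and
converges in probability to $\mathrm{plim}_{\pi_1}(v_{0,n})$ under $\pi_1$. Then the design
effect of $t_n$ is defined as

$$\mathrm{deff}(t_n, v_{0,n}) = \sigma_1^2 / \mathrm{plim}_{\pi_1}(v_{0,n}). \tag{1.1}$$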

Remarks

1. The traditional definition of a design effect (e.g. Kish, 1965, p. 265) as a
measure of design efficiency is $\sigma_1^2/\sigma_0^2$ in the above notation. Definition 1.1 is more
natural as a measure of the impact of the design on estimation. It measures the
effect of acting as if $\pi_0$ is true when in fact $\pi_1$ is true. Note that $\mathrm{deff}^{1/2}$
provides a multiplicative adjustment for the standard-error estimate, $(n^{-1} v_{0,n})^{1/2}$.

2. The design effect will usually be of secondary importance if $t_n$ is inconsistent
under $\pi_1$, that is if $\theta_1$ is not the target parameter. If $\pi_0$ is assumed to be the
true distribution then $t_n$ is usually chosen such that $\theta_0$ is the target parameter.
Then $\theta_1 - \theta_0$ is the asymptotic bias.

3. Definition 1.1 does not depend on $\pi_0$, except as a heuristic device for
deriving $v_{0,n}$. This makes the definition easier to use from the model-based
approach than the definition $\sigma_1^2/\sigma_0^2$.

4. The design effect is unity when $\pi_0 = \pi_1$.

5. The asymptotic nature of the definition simplifies results but is not essential.
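
The adjustment described in Remark 1 can be illustrated with a minimal sketch. The function names and the numerical inputs below are hypothetical and chosen only for illustration; the sketch assumes that a value of the design effect and an IID-based variance estimate $v_{0,n}$ are already available.

```python
import math


def iid_standard_error(v0n: float, n: int) -> float:
    """Standard error computed as if the rows were IID: (v_{0,n} / n) ** 0.5."""
    return math.sqrt(v0n / n)


def adjusted_standard_error(v0n: float, n: int, deff: float) -> float:
    """Multiply the IID standard error by deff ** 0.5, as in Remark 1."""
    if deff <= 0:
        raise ValueError("deff must be positive")
    return math.sqrt(deff) * iid_standard_error(v0n, n)


# Hypothetical inputs (not taken from any survey).
se_iid = iid_standard_error(v0n=4.0, n=400)
se_adj = adjusted_standard_error(v0n=4.0, n=400, deff=2.25)
```

With these illustrative inputs the adjusted standard error is one and a half times the IID-based one, since the square root of 2.25 is 1.5.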

In this article we shall be interested in how design effects depend upon the survey
design and upon the population. We adopt a theoretical approach as opposed to the
empirical approach of, for example, Kish and Frankel (1974). The latter approach may be
more realistic but lacks generalizability because of the enormous range of possible
statistics and population structures. The theoretical approach must make strong
assumptions to obtain useful results but the extent of possible generalization should be
more apparent. Of course the two approaches should complement each other.

We shall be particularly interested in the relation between design effects of
multivariate statistics and design effects of univariate statistics. Such relations are of
practical interest for at least two reasons. Firstly, the survey data collection agency
may publish design effects for univariate statistics but, for confidentiality reasons, may
not make sufficient survey design information available on public use tapes for the data
analyst to estimate standard errors in the usual way. Given suitable theoretical
relations, the analyst could instead impute standard errors for multivariate statistics
using the published univariate design effects. Secondly, even if the analyst has available
full design information it may still be desirable to impute standard errors because of
computer software availability or degrees of freedom limitations.
In Section 2 we outline the basic formal framework and define $\pi_0$ and $\pi_1$. In
Section 3 we apply Definition 1.1 to a general class of estimators under the given $\pi_1$ and
derive results on the form of design effects for the case of equal cluster sizes. An
example is given in Section 4. The case of unequal cluster sizes is considered briefly in
Section 5 and the implications of the results are discussed in Section 6.


2.  FRAMEWORK  AND  ASSUMPTIONS 

Consider a finite population, $U$, partitioned into $K$ clusters. Let the $j$th unit in
the $i$th cluster be labelled $(i,j)$ for $i = 1,\ldots,K$, $j = 1,\ldots,M_i$ where $M_i$ is the size
of the $i$th cluster. A sample is a subset, $S$, of $U = \{(i,j) : i = 1,\ldots,K,\ j = 1,\ldots,M_i\}$.
We suppose that the sample is selected in such a way that each subset $S$
of $U$ has a known probability, $p(S)$, of selection. Conventionally the sample is chosen
in two stages: first, a sample of clusters is selected and then subsamples are selected
within each of the selected clusters. Without loss of generality we write the actual
sample obtained as $s = \{(i,j) : i = 1,\ldots,k,\ j = 1,\ldots,m_i\}$. The sizes of the sample and
population are respectively:

$$n = \sum_{i=1}^{k} m_i, \qquad N = \sum_{i=1}^{K} M_i.$$

We suppose that a $q \times 1$ vector $x_{ij}$ is associated with unit $(i,j)$ in $U$ and let

$$x = (x_{11},\ldots,x_{K M_K})^T, \qquad x_n = (x_{11},\ldots,x_{k m_k})^T$$

be respectively the $N \times q$ matrix of finite population values and the $n \times q$ observed data
matrix discussed in Section 1, where $T$ denotes transpose.

For simplicity we make the following assumption in Sections 2-4.

Assumption 1: There is no auxiliary information to distinguish the clusters, in particular
the cluster sizes are equal: $M_i = M$, $i = 1,\ldots,K$.

We consider the case of unequal $M_i$ in Section 5. We now define the distributions
$\pi_0$ and $\pi_1$ of $x_n$.


Definition 2.1: The true distribution of $x_n$, denoted by $\pi_1(x_n)$, obeys the following
conditions:

(i) conditional on (random) distribution functions, $F_1,\ldots,F_k$, the $x_{ij}$ are
mutually independent and

$$x_{ij} \mid F_i \sim F_i, \qquad i = 1,\ldots,k,\ j = 1,\ldots,m_i,$$

(ii) $F_1,\ldots,F_k$ are IID.

Remark: In (ii) the $F_i$ are functions on $R^q$ and so the distribution of each $F_i$ is
infinite dimensional as in the theory of stochastic processes. More precisely we might
follow Ferguson (1974) and let $\Phi$ be a set of distribution functions on $R^q$, $\Theta$ be a
sigma-algebra of subsets of $\Phi$ and $\Pi$ be a probability measure on $(\Phi,\Theta)$. Then,
equivalently to (ii), we assume $(F_1,\ldots,F_k)$ is an outcome of the product space
$(\Phi,\Theta,\Pi)^k$.

Design-based interpretation of $\pi_1$.

This distribution can be viewed as the randomisation distribution of $x_n$ induced by
simple random sampling with replacement at both stages. Let $G_\alpha$ be the 'empirical'
distribution function of $x$ in the $\alpha$th cluster, i.e. $G_\alpha$ assigns probability mass $M^{-1}$
to each point $x_{\alpha 1},\ldots,x_{\alpha M}$. Let $\Phi = \{G_1,\ldots,G_K\}$ and let $\Pi$ assign probability $K^{-1}$ to
each outcome $G_\alpha$. Hence each $F_i$ is equal to a randomly chosen $G_\alpha$.
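
The design-based reading of Definition 2.1 can be sketched as a small sampling routine. This is only an illustration under stated assumptions: the function name, the toy population of three equal-sized clusters and the seed are all hypothetical, and scalar unit values are used for simplicity.

```python
import random


def two_stage_with_replacement(population, k, m, rng):
    """Draw k clusters with replacement, then m units with replacement in each.

    population: list of K clusters, each a list of M unit values.
    Returns a list of k sub-samples, each of length m.
    """
    sample = []
    for _ in range(k):
        # Stage 1: F_i is a randomly chosen empirical distribution G_alpha.
        cluster = rng.choice(population)
        # Stage 2: conditional on F_i, the x_ij are independent draws from F_i.
        sample.append([rng.choice(cluster) for _ in range(m)])
    return sample


# Hypothetical toy population: K = 3 clusters of size M = 3.
population = [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0], [20.0, 21.0, 22.0]]
rng = random.Random(0)
sample = two_stage_with_replacement(population, k=4, m=2, rng=rng)
```

Because both stages sample with replacement, the same cluster (and the same unit) may appear more than once, which is what makes the cluster-level distributions IID draws.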

Model-based interpretation of $\pi_1$.

Suppose $x$ is a realization of the $N \times q$ random matrix, $X$, with prior
distribution $\pi_1(x)$ obtained by extending Definition 2.1 by substituting $K$ for $k$ and
$M$ for $m_i$. Suppose that the sample design, $p(S)$, is non-informative in the sense that
$S$ and $X$ are independent. Then $\pi_1(x_n)$ is the appropriate distribution of $x_n$ for
model-based inference conditional on $S = s$ (Sugden and Smith, 1984). This is inference
procedure (A) referred to in Section 1.

The distribution $\pi_1(x)$ seems both a natural and a general non-parametric model for
expressing the symmetry between clusters and between units within clusters. A simple
example is the one-way random effects model (e.g. Scott and Smith, 1969). Here $\Phi$ is a
location family, $\Phi = \{G(x-\theta);\ \theta \in R^q\}$ where $G$ is a given distribution function and $\Pi$
defines a prior distribution for $\theta$. For example, $\Phi$ may correspond to the family of
$q$-variate normal distributions with mean $\theta \in R^q$ and a common covariance matrix, and $\Pi$
may correspond to $\theta \sim N_q(\mu, \Omega_B)$. Other examples with scale
parameters and higher-order cumulants varying between clusters are given by Leonard (1975)
and Skinner (1981) respectively.

The distribution $\pi_1(x)$ is also a special case of the two-stage exchangeability/random
permutation model of Bellhouse et al (1977). Their model is more general because, in
particular, it allows for negative intra-cluster correlation as does the similar model of
Royall (1976). However, it is less interpretable and, for example, would not permit
Theorem 3.6, one of our main results. Furthermore, if we add the assumption that $x$ is
part of a doubly infinite sequence $\{x_{ij};\ i = 1,2,\ldots,\ j = 1,2,\ldots\}$ such that (i) and (ii)
hold for any $K$ and $M$ then we would conjecture that this two-stage exchangeability model
could be represented by $\pi_1$. (Aldous, 1981, proves a stronger result for a crossed
rather than a nested doubly infinite array.)

Definition 2.2: The baseline distribution of $x_n$, denoted by $\pi_0(x_n)$, obeys the
following condition:

(i) $x_{11},\ldots,x_{k m_k}$ are IID.

Design-based interpretation of $\pi_0$.

This is the randomization distribution induced in $x_n$ by simple random sampling with
replacement from the whole finite population.

Model-based interpretation of $\pi_0$.

This is the 'textbook' IID assumption referred to in Section 1.

We assume the existence of the first two moments of $x_n$ under both $\pi_0$ and $\pi_1$ and
write


where $\bar{x}_n = n^{-1} \sum_s x_{ij}$ and $g$ is a given function $g: R^q \to R$. This class includes, in
particular, functions of second moments such as correlation coefficients and linear
regression coefficients by defining $x$ to include squares and products of the 'raw' survey
variables (see, for example, Section 4 and Krewski and Rao, 1981). For simplicity we
assume $g$ is scalar-valued but results extend straightforwardly to vector-valued $g$. We
assume the target parameter is

$$\theta_1 = g(\mu) \tag{3.2}$$

where $\mu$ is defined in (2.1). For example, if $t_n$ is the sample correlation coefficient
then $\theta_1$ is the finite population correlation coefficient under the design-based
interpretation or the 'super-population' correlation coefficient under the model-based
interpretation of $\pi_1$.

The main aim of this section is to apply Definition 1.1 to $t_n$ in (3.1) for the
model $\pi_1$ of Definition 2.1. This will be done in Theorem 3.5 but first we need to
establish the conditions of Definition 1.1 for $t_n$ and $\pi_1$. We make use of the following
conditions.


Condition C1($\pi_1$): For some $\epsilon > 0$, $E_{\pi_1} |(x_{ij} - \mu)_l|^{2+\epsilon}$ exists for $l = 1,\ldots,q$, where
$(\cdot)_l$ denotes the $l$th element of a vector.

Condition C2($\pi_1$): The function $g$ admits continuous partial derivatives at $\mu$, at least
one of which does not vanish at $\mu$.

Conditions C1($\pi_0$) and C2($\pi_0$) are defined analogously with $\pi_0$ and the mean of
$x_{ij}$ under $\pi_0$ replacing $\pi_1$ and $\mu$ respectively.

The corollary of the following lemma establishes one condition of Definition 1.1 and
gives the numerator of (1.1).

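Lemma 3.1

If C1($\pi_1$) holds then under $\pi_1$ as $n \to \infty$

$$n^{1/2}(\bar{x}_n - \mu) \xrightarrow{d} N\left[0, (I_q + (m^* - 1)\Gamma)\Omega\right] \tag{3.3}$$

where $\Gamma = \Omega_B \Omega^{-1}$ and $m^* = \lim n^{-1} \sum_{i=1}^{k} m_i^2$.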

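Proof: Let $z_i = k n^{-1} \sum_{j=1}^{m_i} (x_{ij} - \mu)$. Then $\{z_1, z_2, \ldots\}$ is a sequence of independent random
vectors with

$$E_{\pi_1}(z_i) = 0, \qquad E_{\pi_1}(z_i z_i^T) = k^2 n^{-2} m_i (I_q + (m_i - 1)\Gamma)\Omega,$$

$$E_{\pi_1}|(z_i)_l|^{2+\epsilon} = (k n^{-1})^{2+\epsilon} E_{\pi_1}\Big|\sum_{j=1}^{m_i} (x_{ij} - \mu)_l\Big|^{2+\epsilon}$$
$$\le m_i^{2+\epsilon} E_{\pi_1}|(x_{ij} - \mu)_l|^{2+\epsilon} \quad \text{by Minkowski's inequality and since } n \ge k$$
$$= O(1) \quad \text{from C1}(\pi_1) \text{ and since } m_i \le M.$$

Hence by using a central limit theorem such as Lemma 3.1 of Krewski and Rao (1981)

$$k^{1/2}\Big(k^{-1} \sum_{i=1}^{k} z_i\Big) \xrightarrow{d} N(0, \Sigma_z)$$

where

$$\Sigma_z = \lim_{n \to \infty} k^{-1} \sum_{i=1}^{k} E_{\pi_1}(z_i z_i^T) = \lim_{n \to \infty} k n^{-1}\Big(I_q + \Big(n^{-1} \sum_i m_i^2 - 1\Big)\Gamma\Big)\Omega.$$

The result follows since $\bar{x}_n - \mu = k^{-1} \sum_i z_i$.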
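
The variance factor in Lemma 3.1 can be checked numerically in the scalar case. The sketch below is an illustration under stated assumptions rather than part of the proof: it takes $q = 1$, writes the common unit variance as `sigma2` and the covariance between two distinct units of the same cluster as `rho * sigma2` (so that `rho` plays the role of $\Gamma$), and treats units in different clusters as uncorrelated. The function names and the cluster sizes are hypothetical.

```python
def n_times_var_of_mean(cluster_sizes, sigma2, rho):
    """Return n * var(sample mean) by summing every pairwise covariance."""
    labels = [(i, j) for i, m in enumerate(cluster_sizes) for j in range(m)]
    n = len(labels)
    total = 0.0
    for i, j in labels:
        for i2, j2 in labels:
            if (i, j) == (i2, j2):
                total += sigma2          # variance of a single unit
            elif i == i2:
                total += rho * sigma2    # two distinct units, same cluster
            # units in different clusters contribute zero covariance
    return total / n


def closed_form_n_var(cluster_sizes, sigma2, rho):
    """(1 + (n^{-1} sum m_i^2 - 1) * rho) * sigma2, the finite-n analogue of (3.3)."""
    n = sum(cluster_sizes)
    m_star_n = sum(m * m for m in cluster_sizes) / n
    return (1 + (m_star_n - 1) * rho) * sigma2


sizes = [2, 3, 5]  # hypothetical sub-sample sizes m_i
brute = n_times_var_of_mean(sizes, sigma2=2.0, rho=0.1)
closed = closed_form_n_var(sizes, sigma2=2.0, rho=0.1)
```

The two computations agree for any choice of sizes, which is the finite-sample identity behind the limit $m^*$.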

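From standard asymptotic theory we obtain the following:

Corollary 3.2

If C1($\pi_1$) and C2($\pi_1$) hold then under $\pi_1$ as $n \to \infty$

$$n^{1/2}(t_n - \theta_1) \xrightarrow{d} N\left[0, (1 + (m^* - 1)\rho_g)\sigma_g^2\right] \tag{3.4}$$

where

$$\sigma_g^2 = \nabla_g(\mu)^T \Omega \nabla_g(\mu), \qquad \nabla_g(\mu) = \partial g(\mu)/\partial \mu,$$
$$\rho_g = \nabla_g(\mu)^T \Omega_B \nabla_g(\mu)/\sigma_g^2. \tag{3.5}$$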

Remarks

1. Under the model-based approach Corollary 3.2 would also hold if $\theta_1 = g(\bar{x}_N)$ where
$\bar{x}_N = N^{-1} \sum_U x_{ij}$, provided $n/N \to 0$ as $n \to \infty$. Fuller (1975, Appendix A) gives a result
corresponding to Lemma 3.1 for $\bar{x}_n - \bar{x}_N$ where $n/N \to f \ne 0$.

2. $m^*$ exists because the $m_i$ are bounded, $1 \le m_i \le M$.


If the $m_i$ are equal the $u_i$ are IID and by Khinchine's version of the Weak Law of Large Numbers
$k^{-1} \sum u_i \to 0$ in probability under $\pi_1$ even without C1($\pi_1$). If the $m_i$ are unequal then the application of
Minkowski's inequality as in Lemma 3.1 and the use of C1($\pi_1$) together with the fact that
$m_i \le M$ implies that the $(1 + \tfrac{1}{2}\epsilon)$th moment of the absolute value of each element of $u_i$
is bounded uniformly in $i$ and so (e.g. Krewski and Rao, 1981, Lemma 3.2) $k^{-1} \sum u_i \to 0$
in probability under $\pi_1$.
The result then follows by noting that $(n-1)^{-1} k$ is bounded and that $(\bar{x}_n - \mu) \to 0$
in probability under $\pi_1$ from Lemma 3.1.


From the assumed continuity of $\nabla_g$ and the fact that $\bar{x}_n \to \mu$ in probability under $\pi_1$ we obtain:

Corollary 3.4

If C1($\pi_1$) and C2($\pi_1$) hold then as $n \to \infty$

$$v_{g,0,n} \to \sigma_g^2 \quad \text{in probability under } \pi_1. \tag{3.11}$$

We are now in a position to derive our main result.

Theorem 3.5

If C1($\pi_0$), C1($\pi_1$), C2($\pi_0$), C2($\pi_1$) hold then

$$\mathrm{deff}(t_n, v_{g,0,n}) = 1 + (m^* - 1)\rho_g. \tag{3.12}$$

Proof: The conditions of Definition 1.1 hold from Corollaries 3.2 and 3.4 and by noting
that $\pi_0$ is a special case of a model of form $\pi_1$ with $F_1 = \ldots = F_k$. The expression in
(3.12) is obtained from (3.4) and (3.11).

Remarks

1. If $m_i = m$ then $m^* = m$ and the expression in Theorem 3.5 has the familiar
form of the design effect of a mean (Kish, 1965). The IID-based estimator $v_{g,0,n}$
underestimates the variance of $t_n$ by an amount which depends on the subsample size $m$ and the
intra-cluster correlation $\rho_g$.

2. If the $m_i$ are unequal note that

$$m^* = \lim_{n \to \infty}\Big[\bar{m} + \sum_i (m_i - \bar{m})^2/n\Big] \ge \lim_{n \to \infty} \bar{m} \quad \text{where } \bar{m} = n/k.$$

Hence expression (3.12) tends to be greater than the commonly used expression $1 + (\bar{m} - 1)\rho_g$
(Kish, 1965). Our expression for $m^*$ is the limit of expressions appearing in Campbell
(1977) and Rao and Scott (1981).

3. Referring to Remark 2 under Definition 1.1 the asymptotic relative bias is zero under
the design-based interpretation since $\theta_0 = \theta_1 = g(\bar{x}_N)$ and should be negligible under the
model-based interpretation because $t_n - g(\bar{x}_N) \to 0$ in probability under either $\pi_0$ or $\pi_1$.
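
Remark 2 can be illustrated with a short calculation comparing the design effect based on the finite-sample analogue of $m^*$ with the commonly used expression based on the mean sub-sample size. The function names, the cluster sizes and the value of $\rho_g$ below are hypothetical, and the limit in the definition of $m^*$ is replaced by its value at the given sample.

```python
def m_star(cluster_sizes):
    """Finite-sample analogue of m*: n^{-1} * sum of m_i squared."""
    n = sum(cluster_sizes)
    return sum(m * m for m in cluster_sizes) / n


def deff_m_star(cluster_sizes, rho_g):
    """Design effect 1 + (m* - 1) * rho_g, as in (3.12)."""
    return 1 + (m_star(cluster_sizes) - 1) * rho_g


def deff_mean_size(cluster_sizes, rho_g):
    """Commonly used expression 1 + (m_bar - 1) * rho_g with m_bar = n / k."""
    m_bar = sum(cluster_sizes) / len(cluster_sizes)
    return 1 + (m_bar - 1) * rho_g


sizes = [2, 4, 6, 8]   # hypothetical unequal sub-sample sizes
rho_g = 0.05           # hypothetical intra-cluster correlation
d_star = deff_m_star(sizes, rho_g)
d_mean = deff_mean_size(sizes, rho_g)
```

For these sizes the mean sub-sample size is 5 while the analogue of $m^*$ is 6, so the expression based on the mean size gives the smaller design effect.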

For the remainder of this section we examine the quantity $\rho_g$ in (3.12). We may
view $\rho_g$ either as an intra-cluster correlation of $w_{ij}$ as in (3.7) or as a measure of
homogeneity of the $\nabla_g(\mu)^T \mu_i$ as in (3.8). For example, if $q = 1$ and $g$ is the identity
function then $t_n$ is the sample mean $\bar{x}_n$ and $\rho_g$ is the usual intra-cluster correlation
of the $x_{ij}$ which is a measure of homogeneity of the means $\mu_i$ in the different clusters.
In general, however, neither (3.7) nor (3.8) is very easy to interpret because of their
dependence on the rather artificial quantities $w_{ij}$ and $\nabla_g(\mu)^T \mu_i$. In order to obtain a
more interpretable expression for $\rho_g$ we impose a further condition on the distribution
$\pi_1$. This condition is strictly only applicable under the model-based approach.

Referring to Definition 2.1 let $F = E(F_i)$ be the marginal distribution of $x_{ij}$.
We suppose each $F_i$ is a mixture.

Condition C3: $F_i = (1-\delta)F + \delta D_i$, $i = 1,\ldots,k$,

where $0 \le \delta \le 1$ and $D_1,\ldots,D_k$ are IID distribution functions with $E(D_i) = F$.

One extreme $\delta = 0$ then corresponds to $\pi_0$ whilst the other extreme $\delta = 1$ imposes
no further structure on $\pi_1$. We shall suppose that $\delta$ is small which we suggest is a
natural non-parametric way of asserting that there is low intra-cluster correlation. This
assumption may not be unreasonable in, for example, large-scale sample surveys where the
clusters are geographical areas. In such surveys the intra-cluster correlation of
variables is usually low (say < 0.1) by design, even though the design effect may be
non-negligible because of the value of $m^*$.

We need further regularity conditions.

Condition C4: The matrix $H_g$ of second partial derivatives of $g$ exists in a
neighborhood of $\mu$ and

$$\mathrm{var}\left[(\mu(D_i) - \mu)^T H_g[\mu + \epsilon(\mu(D_i) - \mu)](\mu(D_i) - \mu)\right]$$

is bounded as $\epsilon \to 0$ where we use the functional notation $\mu(D_i) = \int x\, dD_i(x)$ so that
$\mu_i = \mu(F_i)$, $\mu = \mu(F)$.
The following theorem gives an alternative approximate expression for $\rho_g$ when the
intra-cluster correlation is low.

Theorem 3.6.

If C3 and C4 hold then as $\delta \to 0$

$$\rho_g = \mathrm{var}[g(\mu_i)]/\mathrm{var}(w_{ij}) + O(\delta^3) = O(\delta^2)$$

where $\mu_i$ and $w_{ij}$ are defined in (3.9) and (3.6).

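Proof: Consider the Taylor series expansion

$$g(\mu_i) = g(\mu) + \nabla_g(\mu)^T(\mu_i - \mu) + \tfrac{1}{2}(\mu_i - \mu)^T H_g(\mu^*)(\mu_i - \mu) \tag{3.13}$$

where $\mu^* = (1-\theta)\mu + \theta\mu_i$ and $\theta$ is a scalar, $0 \le \theta \le 1$. Now

$$\mu_i - \mu = \mu(F_i) - \mu(F) = \mu(F_i - F) = \mu[\delta(D_i - F)] \ \text{(from C3)} = \delta[\mu(D_i) - \mu]. \tag{3.14}$$

The result follows by substituting (3.14) into (3.13) and using (3.8) and C4 with $\epsilon = \theta\delta$.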

The quantity $g(\mu_i)$ is the cluster 'version' of $t_n = g(\bar{x}_n)$ and $\theta_1 = g(\mu)$. For
example, if $t_n$ is the sample correlation coefficient and $\theta_1$ is the population
correlation coefficient then $g(\mu_i)$ is the correlation coefficient in the $i$th cluster. A
specific example is given in Section 4 where $t_n$ is the sample variance and $g(\mu_i)$ is the
variance in the $i$th cluster. The quantity $\mathrm{var}(w_{ij})$ does not depend on the clustering in
the population (in terms of C3 it depends on $F$ but not on $\delta$ or $D_i$) and may be viewed
as a standardizing quantity. Hence Theorem 3.6 permits $\rho_g$ to be interpreted as a measure
of homogeneity of the quantities $g(\mu_i)$, providing the overall level of intra-cluster
correlation is 'low'. Combining with Theorem 3.5 suggests, for example, that the design
effect of a sample correlation coefficient is mainly determined by the difference between
the correlation coefficients within clusters.


4. AN EXAMPLE: DESIGN EFFECT OF A SAMPLE VARIANCE


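The low-$\delta$ approximation in Theorem 3.6 is examined here explicitly for the case where

$$t_n = n^{-1} \sum_s (y_{ij} - \bar{y}_n)^2, \qquad \bar{y}_n = n^{-1} \sum_s y_{ij}.$$

We may write $t_n$ in the form of (3.1) by letting $q = 2$, $x_{ij} = (y_{ij}, y_{ij}^2)^T$, $g[(x_1, x_2)^T] = x_2 - x_1^2$.
Hence $\bar{x}_n = (\bar{y}_n, n^{-1} \sum_s y_{ij}^2)^T$ and $g(\bar{x}_n) = n^{-1} \sum_s y_{ij}^2 - \bar{y}_n^2 = t_n$. Following (2.1) and
(3.9) define the within-cluster and overall moments by

$$\mu_i = (\mu_{yi}, \mu_{yi}^2 + \sigma_{yi}^2)^T, \qquad \mu = (\mu_y, \mu_y^2 + \sigma_y^2)^T.$$

Then $g(\mu_i) = (\mu_{yi}^2 + \sigma_{yi}^2) - \mu_{yi}^2 = \sigma_{yi}^2$ is the within-cluster variance corresponding to the
sample variance $t_n$ and the population variance $g(\mu) = \sigma_y^2$. Also $\nabla_g(\mu) = (-2\mu_y, 1)^T$ obeys
C2($\pi_1$) and from (3.6), up to an additive constant,

$$w_{ij} = (y_{ij} - \mu_y)^2.$$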

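The IID-based variance estimate of $t_n$ given by (3.10) is

$$v_{g,0,n} = (n-1)^{-1} \sum_s (w_{ij}^* - \bar{w}^*)^2 \quad \text{where } w_{ij}^* = (y_{ij} - \bar{y}_n)^2,\ \bar{w}^* = t_n.$$

The low-$\delta$ approximation to $\rho_g$ given by Theorem 3.6 is

$$\rho_g^+ = \mathrm{var}_{\pi_1}[g(\mu_i)]/\mathrm{var}_{\pi_1}(w_{ij}) = \mathrm{var}_{\pi_1}(\sigma_{yi}^2)/\mathrm{var}_{\pi_1}[(y_{ij} - \mu_y)^2]$$

which may be compared with the expression from (3.8)

$$\rho_g = \mathrm{var}_{\pi_1}[\sigma_{yi}^2 + (\mu_{yi} - \mu_y)^2]/\mathrm{var}_{\pi_1}[(y_{ij} - \mu_y)^2].$$

Hence we may write

$$\rho_g^+ \le \rho_g \le \rho_g^+ + 2(\eta \rho_g^+)^{1/2} + \eta$$

where $\eta = \mathrm{var}_{\pi_1}[(\mu_{yi} - \mu_y)^2]/\mathrm{var}_{\pi_1}[(y_{ij} - \mu_y)^2]$.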

Define the between-cluster and total coefficients of kurtosis by

$$\gamma_B = E_{\pi_1}(\mu_{yi} - \mu_y)^4/[\mathrm{var}_{\pi_1}(\mu_{yi})]^2 - 3, \qquad \gamma = E_{\pi_1}(y_{ij} - \mu_y)^4/\sigma_y^4 - 3.$$

Let $\rho_y = \mathrm{var}(\mu_{yi})/\mathrm{var}(y_{ij})$ be the conventional intra-cluster correlation of $y_{ij}$. Then

$$\eta = \rho_y^2 \frac{(2 + \gamma_B)}{(2 + \gamma)}. \tag{4.1}$$

Hence if $\rho_y$ is small then $\eta$ will be very small unless $\gamma$ is very small (for example
$\gamma = -1.2$ for the very platykurtic uniform distribution) or $\gamma_B$ is very large (for example
$\gamma_B = 6$ for the very leptokurtic exponential distribution). Thus if $\rho_y$ is small and
there is reasonable dispersion amongst the $\sigma_{yi}^2$ then $\rho_g \approx \rho_g^+$ should be a fair
approximation. In terms of $\delta$, both $\mu_{yi} - \mu_y$ and $\sigma_{yi}^2 - \sigma_y^2$ are of $O(\delta)$, $\rho_g$
and $\rho_g^+$ are of $O(\delta^2)$ whilst $\eta$ is of $O(\delta^4)$.
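
The size of the correction term in this example can be explored with a small calculation. The sketch below assumes the relation $\eta = \rho_y^2 (2 + \gamma_B)/(2 + \gamma)$ and the upper bound $\rho_g^+ + 2(\eta \rho_g^+)^{1/2} + \eta$ for $\rho_g$; the function names and the numerical inputs are hypothetical and chosen only for illustration.

```python
import math


def eta(rho_y: float, gamma_b: float, gamma: float) -> float:
    """eta = rho_y^2 * (2 + gamma_B) / (2 + gamma); requires gamma > -2."""
    if gamma <= -2:
        raise ValueError("2 + gamma must be positive")
    return rho_y ** 2 * (2 + gamma_b) / (2 + gamma)


def rho_g_upper_bound(rho_g_plus: float, eta_value: float) -> float:
    """Upper bound rho_g^+ + 2 * (eta * rho_g^+)^{1/2} + eta for rho_g."""
    return rho_g_plus + 2 * math.sqrt(eta_value * rho_g_plus) + eta_value


# Hypothetical inputs: zero excess kurtosis at both levels, rho_y = 0.05.
eta_normal = eta(rho_y=0.05, gamma_b=0.0, gamma=0.0)
upper = rho_g_upper_bound(rho_g_plus=0.04, eta_value=eta_normal)

# Same rho_y with a platykurtic total distribution (gamma = -1.2).
eta_uniform = eta(rho_y=0.05, gamma_b=0.0, gamma=-1.2)
```

With $\rho_y = 0.05$ the term $\eta$ is of order $10^{-3}$ in both cases, although the cross term $2(\eta \rho_g^+)^{1/2}$ in the bound need not be negligible relative to $\rho_g^+$.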


5.  UNEQUAL  CLUSTER  SIZES 

The results in Sections 3 and 4 were based on the assumption of equal cluster sizes.
If the $M_i$ are unequal and Definition 2.1 (which does not involve the $M_i$) still applies
then these results will still hold (provided $\{m_1, m_2, \ldots\}$ is a fixed bounded sequence).

From the design-based viewpoint, Definition 2.1 still holds under simple random
sampling with replacement at both stages where the $m_i$ are fixed and do not depend on
the $M_i$.

From the model-based viewpoint, Definition 2.1 remains appropriate if the
within-cluster distributions $F_i$ do not depend on the $M_i$. It does not matter here if the
design $p(S)$ is dependent on the $M_i$ as for example in probability proportional to size
sampling.

In general $F_i$ may depend on $M_i$ and the results of Section 3 will not hold. For
example, $\bar{x}_n$ may no longer even be a consistent estimator of $\mu$. A general discussion of
inference under models for populations with unequal size clusters is given by Sundberg
(1983). We suggest, however, that within strata, and in particular within size strata, our
results should hold at least approximately. In fact, plots of $\bar{x}_i = m_i^{-1} \sum_{j=1}^{m_i} x_{ij}$ and
$m_i^{-1} \sum_{j=1}^{m_i} (x_{ij} - \bar{x}_i)^2$ against $M_i$ for various variables $x$ and data sets in Skinner (1982)
suggested little relation between $F_i$ and $M_i$.


6.  DISCUSSION 

Under given conditions, in particular when the number of sampled clusters is large,
the design effect of two-stage sampling was shown in Theorem 3.5 to take the familiar form,
$1 + (m-1)\rho$, for a broad class of statistics. This result has an interpretation both from
the design-based viewpoint in terms of with replacement sampling and also from a
model-based viewpoint in terms of a fairly general non-parametric model for a clustered
population.

For linear statistics, such as the sample mean, $\rho$ may be interpreted as a measure of
homogeneity of corresponding within cluster quantities, such as cluster means. For
non-linear statistics, such as the sample correlation coefficient, provided the overall level
of intra-cluster correlation is not high, it was shown in Theorem 3.6 that $\rho$ may also be
interpreted as a measure of homogeneity of corresponding within cluster quantities, such as
cluster correlation coefficients.

These  results  have  rather  negative  implications  for  the  existence  of  relations  between 
design  effects  of  multivariate  and  of  univariate  statistics  as  discussed  at  the  end  of 
Section  1.  In  general  we  conclude  no  necessary  theoretical  relation  need  hold.  For 
example,  the  design  effect  of  a  correlation  coefficient,  being  determined  mainly  by  the 
heterogeneity  of  cluster  correlations,  has  in  general  no  necessary  relation  with  the  design 
effects  of  the  means  of  the  two  variables,  which  are  determined  by  the  heterogeneity  of  the 
cluster  means.  Our  conclusion  agrees  with  that  of  Rao  and  Scott  (1981)  on  the  design 
effect involved in testing independence in a bivariate contingency table. They state that
'ideally we would like an approximation ... based on the marginal design effects' (that is
the univariate design effects) but 'such an approximation does not seem possible in
theory'.

Theoretical relations can be derived under restricted assumptions but such results can
be misleading. For example, a regression model of $y$ on $z$ with errors correlated within
clusters but regression slopes $\beta$ constant across clusters is considered by Campbell
(1977) and Scott and Holt (1982). They obtain the $1 + (m-1)\rho$ result for the
least-squares estimator of the slope and show that $\rho = \rho_z \rho_e$ where $\rho_z$ and $\rho_e$ are the
intra-cluster correlations of $z$ and of the residual $e = y - \beta z$ respectively. Now if both $\rho_z$
and $\rho_e$ are small then $\rho$ is very small which the authors take to correspond to Kish and
Frankel's (1974) empirical observation that 'design effects for complex statistics tend to
be less than those for means of the same variable'. However, this approach effectively
assumes away the dominating $O(\delta^2)$ term in Theorem 3.6 determined by the dispersion
between cluster regression coefficients and just obtains the $O(\delta^4)$ term analogous to $\eta$
in (4.1). Hence we suggest the above formula could drastically underestimate the true
design effect. Other examples of the application of Theorems 3.5 and 3.6 for specific
statistics and under restricted assumptions are given in Skinner (1982).
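
The magnitude of the constant-slope formula can be seen in a small numerical sketch. It assumes only the expressions quoted above, $1 + (m-1)\rho$ with $\rho = \rho_z \rho_e$; the function names and the input values are hypothetical and are not taken from any of the cited papers.

```python
def deff_from_rho(m: int, rho: float) -> float:
    """Design effect of the form 1 + (m - 1) * rho."""
    return 1 + (m - 1) * rho


def deff_constant_slope(m: int, rho_z: float, rho_e: float) -> float:
    """Design effect of the slope when rho is taken to be rho_z * rho_e."""
    return deff_from_rho(m, rho_z * rho_e)


# Hypothetical values: sub-sample size 20, both correlations equal to 0.05.
deff_slope = deff_constant_slope(m=20, rho_z=0.05, rho_e=0.05)
deff_mean_z = deff_from_rho(m=20, rho=0.05)
```

Under these illustrative values the product formula gives a design effect for the slope that is barely above one, whereas the mean of $z$ has a design effect close to two. This is the sense in which the formula yields small design effects whenever both correlations are small, irrespective of any dispersion between cluster regression coefficients.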

Rao and Scott (1981), following on from their statement above, suggest that 'it may be
possible to find empirically-based approximations that work well in practice'. In another
context, for example, Bebbington and Smith (1977) suggest an empirical relation between the
design effect of a correlation coefficient and the minimum of the design effects of the
corresponding means. The derivation of such empirical 'laws', whilst potentially useful,
is no easy project without guidance from theory, given the infinite range of possible
statistics, designs and population structures.